VScript: Controllable Script Generation with Visual Presentation
To offer a customized script-writing tool and to inspire professional
scriptwriters, we present VScript, a controllable pipeline that generates
complete scripts, including dialogues and scene descriptions, and presents
them visually using video retrieval. Through an interactive interface, our
system allows users to select genres and input starting words that control the
theme and development of the generated script. We adopt a hierarchical
structure that first generates the plot, then the script and its visual
presentation. We also introduce a novel approach to plot-guided dialogue
generation by treating it as inverse dialogue summarization. Experimental
results show that our approach outperforms the baselines in both automatic and
human evaluations, especially in genre control.
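The hierarchical structure described above (plot first, then script, then visual presentation) could be sketched roughly as follows. All function names and return shapes here are placeholder assumptions for illustration, not the VScript authors' actual components:

```python
# Minimal sketch of a hierarchical plot -> script -> visuals pipeline,
# using placeholder generators; a real system would condition language
# models on the genre and starting words at each stage.

def generate_plot(genre: str, starting_words: str) -> str:
    # Stage 1: a plot conditioned on user-selected genre and starting words.
    return f"[{genre}] plot seeded by: {starting_words}"

def generate_script(plot: str) -> list[str]:
    # Stage 2: plot-guided dialogue generation, framed by the paper as
    # inverse dialogue summarization (expand a summary into dialogue).
    return [f"SCENE: {plot}", "A: opening line", "B: reply"]

def retrieve_videos(script_lines: list[str]) -> list[str]:
    # Stage 3: retrieve one video clip per script line (placeholder ids).
    return [f"clip-{i}" for i, _ in enumerate(script_lines)]

def vscript_pipeline(genre: str, starting_words: str):
    plot = generate_plot(genre, starting_words)
    script = generate_script(plot)
    clips = retrieve_videos(script)
    return plot, script, clips

plot, script, clips = vscript_pipeline("horror", "an empty lighthouse")
```

The key design point is that each stage consumes only the previous stage's output, so the user-facing controls (genre, starting words) need to be injected only once, at the top of the hierarchy.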
State-of-the-art generalisation research in NLP: a taxonomy and review
The ability to generalise well is one of the primary desiderata of natural
language processing (NLP). Yet, what 'good generalisation' entails and how it
should be evaluated is not well understood, nor are there any common standards
to evaluate it. In this paper, we aim to lay the groundwork to improve both of
these issues. We present a taxonomy for characterising and understanding
generalisation research in NLP, we use that taxonomy to present a comprehensive
map of published generalisation studies, and we make recommendations for which
areas might deserve attention in the future. Our taxonomy is based on an
extensive literature review of generalisation research, and contains five axes
along which studies can differ: their main motivation, the type of
generalisation they aim to solve, the type of data shift they consider, the
source by which this data shift is obtained, and the locus of the shift within
the modelling pipeline. We use our taxonomy to classify over 400 previous
papers that test generalisation, for a total of more than 600 individual
experiments. Considering the results of this review, we present an in-depth
analysis of the current state of generalisation research in NLP, and make
recommendations for the future. Along with this paper, we release a webpage
where the results of our review can be dynamically explored, and which we
intend to update as new NLP generalisation studies are published. With this
work, we aim to make steps towards making state-of-the-art generalisation
testing the new status quo in NLP.
Comment: 35 pages of content + 53 pages of references.
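The five axes of the taxonomy can be pictured as a simple record type, where each classified experiment gets one value per axis. The enum members and field values below are illustrative assumptions, not the authors' exact category names:

```python
from dataclasses import dataclass
from enum import Enum

class Motivation(Enum):
    # Axis 1: why the generalisation study was done (assumed labels).
    PRACTICAL = "practical"
    COGNITIVE = "cognitive"
    INTRINSIC = "intrinsic"
    FAIRNESS = "fairness"

class ShiftLocus(Enum):
    # Axis 5: where in the modelling pipeline the shift occurs (assumed labels).
    TRAIN_TEST = "train-test"
    FINETUNE_TEST = "finetune-test"
    PRETRAIN_TEST = "pretrain-test"

@dataclass(frozen=True)
class GeneralisationExperiment:
    motivation: Motivation       # axis 1: main motivation
    generalisation_type: str     # axis 2: e.g. "compositional", "cross-lingual"
    shift_type: str              # axis 3: type of data shift considered
    shift_source: str            # axis 4: how the shift was obtained
    shift_locus: ShiftLocus      # axis 5: locus of the shift in the pipeline

# Classifying one hypothetical experiment along the five axes:
exp = GeneralisationExperiment(
    motivation=Motivation.PRACTICAL,
    generalisation_type="cross-lingual",
    shift_type="covariate shift",
    shift_source="naturally occurring",
    shift_locus=ShiftLocus.TRAIN_TEST,
)
print(exp.motivation.value)  # practical
```

Making the record immutable (`frozen=True`) reflects how the review uses the taxonomy: each of the 600+ experiments is assigned a fixed position along the five axes, which can then be aggregated to map the field.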
A taxonomy and review of generalization research in NLP
Funding Information: We thank A. Williams, A. Joulin, E. Bruni, L. Weber, R. Kirk and S. Riedel for providing feedback on the various stages of this paper, and G. Marcus for providing detailed feedback on the final draft. We also thank the reviewers of our work for providing useful comments. We thank E. Hupkes for making the app that allows searching through references, and we thank D. Haziza and E. Takmaz for other contributions to the website. M.G. was supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 819455). V.D. was supported by the UKRI Centre for Doctoral Training in Natural Language Processing, funded by the UKRI (grant no. EP/S022481/1) and the University of Edinburgh. N.S. was supported by the Hyundai Motor Company (under the project Uncertainty in Neural Sequence Modeling) and the Samsung Advanced Institute of Technology (under the project Next Generation Deep Learning: From Pattern Recognition to AI). Publisher Copyright: © 2023, The Author(s). Peer reviewed. Publisher PDF.
ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation
Code-switching is a speech phenomenon occurring when a speaker switches
language during a conversation. Despite the spontaneous nature of
code-switching in conversational spoken language, most existing works collect
code-switching data from read speech instead of spontaneous speech. ASCEND (A
Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English
code-switching corpus built on spontaneous multi-turn conversational dialogue
sources collected in Hong Kong. We report ASCEND's design and procedure for
collecting the speech data, including annotations. ASCEND consists of 10.62
hours of clean speech, collected from 23 bilingual speakers of Chinese and
English. Furthermore, we conduct baseline experiments using pre-trained wav2vec
2.0 models, achieving a best performance of 22.69% character error rate and
27.05% mixed error rate.
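Character error rate, the headline metric above, is conventionally the Levenshtein edit distance between reference and hypothesis transcripts divided by the reference length. A minimal sketch (this is the standard definition, not the ASCEND authors' exact evaluation script):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences (one-row DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance(ref[:0], hyp[:j])
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                           # deletion
                dp[j - 1] + 1,                       # insertion
                prev + (ref[i - 1] != hyp[j - 1]),   # substitution / match
            )
            prev = cur
    return dp[n]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

# One substituted character in an 8-character code-switched reference:
print(round(cer("你好 world", "你号 world"), 3))  # 0.125
```

For code-switched speech, operating at the character level sidesteps the question of what counts as a "word" across Chinese and English; the paper's mixed error rate similarly combines character-level scoring for Chinese with word-level scoring for English.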
A taxonomy and review of generalization research in NLP
Funder: N.S. was supported by the Hyundai Motor Company (under the project Uncertainty in Neural Sequence Modeling) and the Samsung Advanced Institute of Technology (under the project Next Generation Deep Learning: From Pattern Recognition to AI).
Abstract: The ability to generalize well is one of the primary desiderata for models of natural language processing (NLP), but what 'good generalization' entails and how it should be evaluated is not well understood. In this Analysis we present a taxonomy for characterizing and understanding generalization research in NLP. The proposed taxonomy is based on an extensive literature review and contains five axes along which generalization studies can differ: their main motivation, the type of generalization they aim to solve, the type of data shift they consider, the source by which this data shift originated, and the locus of the shift within the NLP modelling pipeline. We use our taxonomy to classify over 700 experiments, and we use the results to present an in-depth analysis that maps out the current state of generalization research in NLP and make recommendations for which areas deserve attention in the future.